@Sparshkhare1306
📋 Summary


This PR ports a GENIE-like model extraction and pruning attack into the PyGIP codebase and adds a small PyTorch-Geometric (PyG) fallback so the project can run smoke demos on machines that do not have a working DGL installation. The goal is to make it straightforward for maintainers to run a quick demonstration of the GENIE attacks inside the PyGIP repo.

Files added / changed (high-level)

  • attacks/genie_model_extraction.py — GENIE-style model extraction attack, adapted to the repo's BaseAttack API.
  • attacks/genie_pruning_attack.py — Pruning attack, likewise adapted to the repo's BaseAttack API.
  • models/gcn_link_predictor.py — Minimal GCN link-prediction model used by the attacks and the demo trainer.
  • pygip/datasets/datasets.py and pygip/datasets/__init__.py — PyG-only fallback dataset loaders: load_ca_hepth, load_c_elegans, and a SimpleDataset wrapper.
  • examples/train_small_predictor.py — Small trainer that saves a demo checkpoint (examples/watermarked_model_demo.pth).
  • examples/run_genie_experiments.py — Example script that runs extraction then pruning and prints metrics.
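
For orientation, the link predictor in models/gcn_link_predictor.py follows the standard GCN-encoder / dot-product-decoder recipe. Here is a minimal NumPy sketch of that math (layer sizes, helper names, and the dot-product decoder are illustrative assumptions, not the file's actual code):

```python
import numpy as np

def gcn_layer(a_hat, h, w):
    # One GCN layer: propagate features over the normalized
    # adjacency, then apply a linear map and ReLU.
    return np.maximum(a_hat @ h @ w, 0.0)

def link_scores(a_hat, x, w1, w2, edges):
    # Two-layer GCN encoder, then a dot-product decoder that
    # scores each candidate edge (u, v).
    z = gcn_layer(a_hat, gcn_layer(a_hat, x, w1), w2)
    return np.array([z[u] @ z[v] for u, v in edges])

rng = np.random.default_rng(0)
n, d = 5, 8                                  # 5 nodes, 8-dim features
a = (rng.random((n, n)) < 0.4).astype(float)
a = np.maximum(a, a.T)                       # symmetrize
np.fill_diagonal(a, 1.0)                     # self-loops
d_inv = np.diag(1.0 / np.sqrt(a.sum(1)))
a_hat = d_inv @ a @ d_inv                    # symmetric normalization
x = rng.random((n, d))
w1, w2 = rng.random((d, 16)), rng.random((16, 16))
scores = link_scores(a_hat, x, w1, w2, [(0, 1), (2, 3)])
print(scores.shape)  # (2,)
```

The actual model is a PyTorch module; this sketch only shows the forward-pass math the attacks rely on.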

🧪 Related Issues

  • Port of GENIE attacks requested for PyGIP integration (no issue number currently; please link if there is an existing issue).
  • This PR is intended to satisfy the request to adapt GENIE-style attack code into the PyGIP format for review and smoke-testing.

✅ Checklist

  • My code follows the project's coding style (best effort; please flag any style issues in review).
  • I have tested the changes and verified that they work (smoke tests run locally; see "How to verify" below).
  • I have added necessary documentation (I added inline comments and example scripts; please advise if you want docs in docs/).
  • I have linked related issues above (none existed; please add if applicable).
  • The PR is made from a feature branch (feat/genie-watermark-ft), not main.

🧠 Additional Context (Important — please read)

Quick reproduction steps (exact commands)

From the repository root:

  1. Make the repo importable (temporary):
     export PYTHONPATH="$(pwd):$PYTHONPATH"
  2. (Optional) Install the package in editable mode for consistent imports:
     pip install -e .
  3. Train the small demo teacher:
     python examples/train_small_predictor.py
     This prints training losses and writes examples/watermarked_model_demo.pth.
  4. Run the GENIE-style demo (extraction + pruning):
     python examples/run_genie_experiments.py --dataset CA-HepTh --model_path examples/watermarked_model_demo.pth
  5. Expected lines in the output:
     Loaded dataset CA-HepTh: nodes=9877 feat_dim=64
     [GenieModelExtraction] Running on device cpu
     Extraction result: {'dataset': 'CA-HepTh', 'query_ratio': 0.05, 'surrogate_test_auc': <number>}
     Pruning result: {'dataset': 'CA-HepTh', 'prune_ratio': 0.2, 'test_auc': <number>, 'watermark_auc': <maybe None>}
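
To give reviewers a feel for the pruning step at prune_ratio=0.2, here is a small NumPy sketch of global magnitude pruning (an assumption about the mechanism; the ported attack may prune differently):

```python
import numpy as np

def magnitude_prune(weights, prune_ratio):
    # Zero out the prune_ratio fraction of entries with the
    # smallest absolute value, pooled across all weight matrices.
    flat = np.concatenate([w.ravel() for w in weights])
    k = int(prune_ratio * flat.size)
    threshold = np.partition(np.abs(flat), k)[k] if k > 0 else -np.inf
    return [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]

rng = np.random.default_rng(0)
ws = [rng.normal(size=(4, 4)), rng.normal(size=(2, 2))]  # 20 entries total
pruned = magnitude_prune(ws, 0.2)
sparsity = sum((w == 0).sum() for w in pruned) / 20
print(round(sparsity, 2))  # 0.2
```

The attack then re-evaluates test_auc (and watermark_auc, if a watermark is present) on the pruned model.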

What I observed during local testing (so reviewers know what to expect)

  • If no checkpoint is provided, both the extraction AUC and the post-pruning AUC sit at ~0.5 (random), which is expected since the teacher is untrained.

  • After training the small demo teacher (examples/train_small_predictor.py) and supplying that checkpoint:

    • surrogate_test_auc can increase (example observed ≈ 0.70 on the tiny demo teacher).
    • test_auc after pruning can be significantly >0.5 depending on the demo checkpoint (observed ≈ 0.79 during local runs).
  • The current implementation is a smoke/demo implementation — it is not a full, large-scale reproduction of the GENIE paper experiments (no large hyperparameter sweeps, multiple seeds, or large dataset jobs included).
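
The ~0.5 baseline is just the AUC of random link scores. A quick rank-based sanity check (plain NumPy here, not the repo's metric code):

```python
import numpy as np

def auc(labels, scores):
    # Rank-based AUC (Mann-Whitney U): the probability that a
    # random positive edge is scored above a random negative one.
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    pos_rank_sum = ranks[labels == 1].sum()
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 20000)
print(auc(labels, rng.random(20000)))  # close to 0.5: scores carry no signal
print(auc(np.array([0, 0, 1, 1]), np.array([0.1, 0.2, 0.8, 0.9])))  # 1.0: perfect ranking
```

So the observed jump from ~0.5 to ~0.70 after training the demo teacher indicates the surrogate is genuinely learning from the teacher's responses.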

Important limitations & notes

  • This PR adds a PyG fallback so reviewers who cannot install DGL can run the demo. DGL codepaths remain in the repository; if DGL is available, you can still use them.
  • Checkpoint loading is “best effort.” If you feed it a checkpoint from a different model definition, adapt the loader in models/gcn_link_predictor.py to match your state-dict keys.
  • The ported attack code aims for clarity and compatibility with PyGIP’s BaseAttack API — it preserves algorithmic intent but is simplified for readability and reproducible smoke runs.
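
"Best effort" loading typically means remapping keys and calling load_state_dict with strict=False, so a mismatched checkpoint degrades gracefully instead of crashing. A minimal sketch of that pattern (the "module." prefix strip is an illustrative guess at a common DataParallel-style mismatch, not necessarily what the loader does):

```python
import os
import tempfile

import torch
import torch.nn as nn

def load_best_effort(model, ckpt_path, strip_prefix="module."):
    # Load whatever keys line up; with strict=False, mismatched keys
    # are reported instead of raising an error.
    state = torch.load(ckpt_path, map_location="cpu")
    state = {k[len(strip_prefix):] if k.startswith(strip_prefix) else k: v
             for k, v in state.items()}
    result = model.load_state_dict(state, strict=False)
    if result.missing_keys or result.unexpected_keys:
        print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
    return model

# Demo: save a prefixed state dict, then load it into a fresh model.
src = nn.Linear(4, 2)
path = os.path.join(tempfile.gettempdir(), "demo_ckpt.pth")
torch.save({"module." + k: v for k, v in src.state_dict().items()}, path)
dst = load_best_effort(nn.Linear(4, 2), path)
print(torch.equal(dst.weight, src.weight))  # True
```

If a reviewer's checkpoint uses entirely different layer names, the remap dictionary is the place to adapt.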
